Abstract

Confidence intervals for the population median based on interpolating adjacent order statistics are presented. They are shown to depend only slightly on the underlying distribution. A simple, nonlinear interpolation formula is given which works well for a broad collection of underlying distributions.
1. Introduction.


Suppose $X_1, \ldots, X_n$ is a random sample of size $n$ from a distribution with absolutely continuous distribution function $F(x-\theta)$ and density $f(x-\theta)$. Further, suppose $F(0) = 1/2$, uniquely, so that $\theta$ is the unique median; no shape assumption is imposed on $F$. Let $X_{(1)} \le \cdots \le X_{(n)}$ denote the order statistics. Then the interval $[X_{(d)}, X_{(n-d+1)}]$ is a simple distribution-free confidence interval for $\theta$. The confidence coefficient is $\gamma = 1 - 2P(S < d)$, where $S$ has a binomial distribution with parameters $n$ and $1/2$. If $S$ denotes the sign test statistic for testing $H_0: \theta = 0$ versus $H_A: \theta \ne 0$, then the interval corresponds to inverting the acceptance region of a size $\alpha = 2P(S < d)$ test. See Hettmansperger (1984a, Section 1.5).
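The confidence coefficient and the interval itself are easy to compute directly. The following is a minimal sketch, not code from the paper; the function names and the sample values are made up for illustration. For n = 10 it gives roughly 0.9785 for d = 2 and 0.8906 for d = 3, which agree to within rounding with the values used in the later n = 10, d = 2 illustration.

```python
from math import comb


def sign_interval_coefficient(n: int, d: int) -> float:
    """Confidence coefficient gamma_d = 1 - 2 P(S < d), with S ~ Binomial(n, 1/2)."""
    lower_tail = sum(comb(n, k) for k in range(d)) / 2 ** n  # P(S < d)
    return 1.0 - 2.0 * lower_tail


def sign_interval(sample, d: int):
    """Return (X_(d), X_(n-d+1)), using 1-based order statistics."""
    xs = sorted(sample)
    n = len(xs)
    return xs[d - 1], xs[n - d]


if __name__ == "__main__":
    data = [3.1, 0.4, 2.2, 5.9, 1.7, 4.4, 2.8, 3.6, 0.9, 6.3]  # hypothetical sample
    print(sign_interval(data, 2), sign_interval_coefficient(len(data), 2))
```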

This confidence interval is quite versatile since it makes no shape assumption on the underlying distribution, is easy to compute, and requires only a binomial table to establish the confidence coefficient. In Hettmansperger (1984b) we recommend using the interval to form the notches in a notched box plot and to construct simple two-sample tests based on comparing these intervals. The Minitab computing system uses these intervals in its box plot routine.

Because of the discreteness of the binomial distribution, for small to moderate sample sizes the available set of possible confidence coefficients is rather sparse. In this paper we consider the problem of interpolating adjacent order statistics to form confidence intervals with intermediate values of the confidence coefficients. The interpolated intervals are no longer distribution free in general; however, we will show that the confidence coefficient depends only slightly on the underlying $F$ for a broad collection of underlying distributions. We will also show that linear interpolation is not appropriate, and we will provide a simple interpolation formula that works well in most practical situations.


2.  Properties  of  the  Interpolated  Intervals. 


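Suppose $[X_{(d)}, X_{(n-d+1)}]$ and $[X_{(d+1)}, X_{(n-d)}]$ are $\gamma_d = 1 - \alpha_d$ and $\gamma_{d+1} = 1 - \alpha_{d+1}$ confidence intervals for $\theta$, respectively. Then, from the binomial distribution, we have

$$\frac{\alpha_{d+1}}{2} = \frac{\alpha_d}{2} + \binom{n}{d}\left(\frac{1}{2}\right)^n \qquad (1)$$

or

$$\gamma_{d+1} = \gamma_d - 2\binom{n}{d}\left(\frac{1}{2}\right)^n .$$

This links the successive intervals based on $d$ and $d+1$.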


Define, for $0 \le \lambda < 1$,

$$X_L = (1-\lambda) X_{(d)} + \lambda X_{(d+1)}$$

$$X_U = (1-\lambda) X_{(n-d+1)} + \lambda X_{(n-d)}$$

and let $\gamma$ be the confidence coefficient for $[X_L, X_U]$. Then

$$\gamma_{d+1} < \gamma \le \gamma_d ,$$

and we wish to establish the connection between $\lambda$ and $\gamma$. Given $\gamma$, we will present a simple interpolation formula for finding $\lambda$; see (7) in Section 3. Note that

$$\gamma = P(X_L \le \theta \le X_U) = 1 - P(\theta < X_L) - P(\theta > X_U) = 1 - \alpha_L - \alpha_U .$$


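Proposition 1. Let $\alpha = \lambda/(1-\lambda)$. Then

$$\alpha_L = \frac{\alpha_{d+1}}{2} - (n-d)\binom{n}{d} \int_0^\infty [F(-\alpha y)]^d [1-F(y)]^{n-d-1}\, dF(y)$$

$$\alpha_U = \frac{\alpha_{d+1}}{2} - (n-d)\binom{n}{d} \int_{-\infty}^0 [1-F(-\alpha y)]^d [F(y)]^{n-d-1}\, dF(y) \qquad (2)$$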


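Proof. Without loss of generality let $\theta = 0$. Let $D$ denote the set $\{(x,y): -\infty < x < y < \infty,\ (1-\lambda)x + \lambda y > 0\}$. Then, denoting the joint density of $X_{(d)}$ and $X_{(d+1)}$ by $f(x,y)$, we have

$$\alpha_L = P(0 < X_L) = \iint_D f(x,y)\, dx\, dy$$

$$= \frac{n!}{(d-1)!\,(n-d-1)!} \int_0^\infty \int_{-\alpha y}^{y} [F(x)]^{d-1} [1-F(y)]^{n-d-1}\, dF(x)\, dF(y)$$

$$= \frac{n!}{d!\,(n-d-1)!} \int_0^\infty \left\{ [F(y)]^d - [F(-\alpha y)]^d \right\} [1-F(y)]^{n-d-1}\, dF(y) .$$

The formula for $\alpha_L$ now follows from the fact that

$$\frac{n!}{d!\,(n-d-1)!} \int_0^\infty [F(y)]^d [1-F(y)]^{n-d-1}\, dF(y) = P(X_{(d+1)} > 0) = \frac{\alpha_{d+1}}{2} .$$

The formula for $\alpha_U$ follows in a similar way.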


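Proposition 2. Suppose $f$ is symmetric about 0. Then

(i) $\alpha_L = \alpha_U$, and

(ii) if $\lambda = 1/2$, it follows that

$$\alpha_L = \alpha_U = \frac{\alpha_{d+1}}{2} - \frac{n-d}{n}\binom{n}{d}\left(\frac{1}{2}\right)^n \quad\text{and}\quad \gamma = \gamma_d - \frac{2d}{n}\binom{n}{d}\left(\frac{1}{2}\right)^n .$$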


Proof. Part (i) follows at once from Proposition 1 since $F(\alpha y) = 1 - F(-\alpha y)$. In part (ii), note that $\lambda = 1/2$ implies $\alpha = 1$, so that we have

$$\int_0^\infty [F(-y)]^d [1-F(y)]^{n-d-1}\, dF(y) = \int_0^\infty [1-F(y)]^{n-1}\, dF(y) = \frac{1}{n}\left(\frac{1}{2}\right)^n .$$

The formulas in (ii) now follow from the result for $\alpha_L$ in Proposition 1 and the result in (1).


Given a desired $\gamma$, define the interpolation factor $I$ by

$$I = \frac{\gamma_d - \gamma}{\gamma_d - \gamma_{d+1}} . \qquad (5)$$

Note that $I$ depends upon $\lambda$ through $\gamma$ so we will write $I(\lambda)$ when it is necessary to express this dependence.

Proposition 3. Let $\alpha = \lambda/(1-\lambda)$. Suppose $f$ is symmetric about 0. Then

(i) $$I(\lambda) = 1 - (n-d)\, 2^n \int_0^\infty [F(-\alpha y)]^d [1-F(y)]^{n-d-1}\, dF(y) , \qquad (6)$$

$I(0) = 0$, $I(1/2) = d/n$ and $I(\lambda) \to 1$ as $\lambda \to 1$.

(ii) If $F$ is sufficiently regular so that differentiation can be carried out under the integral and if $f'(x) > 0$ for $x < 0$, then $I(\lambda)$ is a continuous and strictly increasing convex function of $\lambda$.


Proof. Using $\gamma = 1 - 2\alpha_L$, from (5) and (1), we have

$$I = \left[\alpha_L - \frac{\alpha_d}{2}\right] \Big/ \left[\binom{n}{d}\left(\frac{1}{2}\right)^n\right] .$$

The formula for $I$ now follows from Propositions 1 and 2 by substitution. The limit follows by the dominated convergence theorem since $F(-\alpha y) \to 0$ as $\alpha \to \infty$ for $y > 0$. Part (ii) follows by verifying that $I'(\lambda) > 0$ and $I''(\lambda) > 0$.

This proposition shows at once that linear interpolation is inappropriate. If we used linear interpolation then $I(1/2)$ would have to equal $1/2$. However, $I(1/2) = d/n$, which is less than $1/2$. For example, with $n = 10$, $d = 2$, $\gamma_d = .9786$, $\gamma_{d+1} = .8907$, we have $I = d/n = .2$ and $\gamma = .9610$, corresponding to $\lambda = 1/2$. Linear interpolation yields .9347. As a crude approximation take $d \doteq n/2 - n^{1/2} Z/2 + .5$, where $Z$ is the upper $\alpha_d/2$ quantile from the standard normal distribution; then

$$I(1/2) = \frac{d}{n} \doteq \frac{1}{2} - \frac{Z}{2 n^{1/2}} + \frac{1}{2n} < \frac{1}{2} .$$

Further, since $I(1/2) = d/n$, we have an additional distribution-free interval when $f$ is symmetric. This helps in the search for an interpolation formula since the curve for $I(\lambda)$ must pass through the ordinates 0, $d/n$, and 1 (at $\lambda = 0$, $1/2$, and 1), at least for a symmetric $f$. In principle, given $\lambda \ne 1/2$ we would need to specify $F$ to find $I$ according to (6). In practice, we wish to specify $I$, through $\gamma$, and find $\lambda$. From Part (ii) there exists a strictly increasing concave curve that relates $\lambda$ to $I$, but it is generally impossible to find because of the complexity of (6).
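The n = 10, d = 2 comparison can be checked with a few lines of arithmetic. This is only an illustrative sketch; the helper name `coefficient_from_factor` is not from the paper, and the two coefficients are the rounded values quoted in the text.

```python
def coefficient_from_factor(gamma_d: float, gamma_d1: float, factor: float) -> float:
    """Definition (5) solved for gamma: gamma = gamma_d - (gamma_d - gamma_{d+1}) * I."""
    return gamma_d - (gamma_d - gamma_d1) * factor


n, d = 10, 2
gamma_d, gamma_d1 = 0.9786, 0.8907  # rounded values quoted in the text

# At lambda = 1/2 a symmetric density gives I = d/n, not 1/2.
at_half = coefficient_from_factor(gamma_d, gamma_d1, d / n)
# Linear interpolation would instead use I = lambda = 1/2.
linear = coefficient_from_factor(gamma_d, gamma_d1, 0.5)

if __name__ == "__main__":
    print(f"I = d/n: {at_half:.4f}   linear: {linear:.4f}")
```

The gap between the two results (about .026 here) is the error that linear interpolation would introduce at the midpoint.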

In the next section we find $I(\lambda)$ for several different distributions. We show that the curves are quite close to one another. We then select one which yields a particularly simple formula for $I(\lambda)$ (it results when $F$ is the double exponential distribution), and invert it to provide a formula for $\lambda$ in terms of $I$.

3.  Examples  and  a  Recommendation 

In this section we provide explicit formulas for $I(\lambda)$ for underlying uniform and double exponential distributions and an asymmetric distribution formed by piecing together double exponential and logistic distributions. We also provide numerical results, based on numerical integration, for the normal and Cauchy distributions. The numerical integrations were carried out using the method of Donker and Piessens (1975).

Numerical examples are given in Tables 1 and 2.
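For readers who want to reproduce the numerical results approximately, here is a rough sketch of evaluating (6) numerically for a symmetric density. It does not use the Donker and Piessens method; a plain midpoint rule is swapped in after the substitution p = F(y), which maps the range of integration to (1/2, 1). The function names and the number of subintervals `m` are assumptions made for this illustration.

```python
from math import atan, pi, tan
from statistics import NormalDist


def interpolation_factor_numeric(cdf, quantile, lam, n, d, m=20000):
    """Approximate I(lambda) in (6) for a density symmetric about 0.

    With p = F(y) the integral over (0, inf) becomes the integral over
    (1/2, 1) of F(-alpha * Q(p))**d * (1 - p)**(n - d - 1) dp, where Q is
    the quantile function. The midpoint rule never evaluates Q at p = 1.
    """
    alpha = lam / (1.0 - lam)
    h = 0.5 / m
    total = 0.0
    for k in range(m):
        p = 0.5 + (k + 0.5) * h
        total += cdf(-alpha * quantile(p)) ** d * (1.0 - p) ** (n - d - 1)
    return 1.0 - (n - d) * 2 ** n * total * h


normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def cauchy_cdf(x):
    return 0.5 + atan(x) / pi


def cauchy_quantile(p):
    return tan(pi * (p - 0.5))


if __name__ == "__main__":
    for lam in (0.1, 0.3, 0.5, 0.7, 0.9):
        i_normal = interpolation_factor_numeric(normal.cdf, normal.inv_cdf, lam, 10, 2)
        i_cauchy = interpolation_factor_numeric(cauchy_cdf, cauchy_quantile, lam, 10, 2)
        print(lam, round(i_normal, 3), round(i_cauchy, 3))
```

A useful sanity check is that any symmetric density should return d/n at λ = 1/2, as Proposition 3 requires.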

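Example 1. The double exponential distribution. The distribution function is given by $F(x) = 2^{-1}\exp(x)$ if $x < 0$, and $1 - 2^{-1}\exp(-x)$ if $x \ge 0$. In (6) replace $F(-\alpha y)$ by $1 - F(\alpha y)$ and then

$$\int_0^\infty [1-F(\alpha y)]^d [1-F(y)]^{n-d-1}\, dF(y) = \left(\frac{1}{2}\right)^n \int_0^\infty \exp\{-y(d\alpha + n - d)\}\, dy = \left(\frac{1}{2}\right)^n [n + d(\alpha - 1)]^{-1} .$$

Hence, from (6), $I(\lambda) = d\alpha/[n + d(\alpha - 1)]$. Recalling that $\alpha = \lambda/(1-\lambda)$ we find

$$\lambda = \frac{(n-d)\, I}{d + (n-2d)\, I} . \qquad (7)$$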


As a final remark, note that if we take a curve fitting approach and try to find a concave curve through $(0,0)$, $(d/n, 1/2)$ and $(1,1)$, then (7) results when we fit $\lambda = aI/(b + cI)$. Polynomial fits were not very satisfactory and (7) represents a simple ratio of linear polynomials.
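The closed form for the double exponential case and its inverse (7) translate directly into code. The sketch below uses illustrative function names that are not from the paper. For n = 10 and d = 2 the computed factors agree with the DE column of Table 1 to within about .001.

```python
def de_interpolation_factor(lam: float, n: int, d: int) -> float:
    """I(lambda) = d*alpha / [n + d*(alpha - 1)] with alpha = lambda / (1 - lambda)."""
    alpha = lam / (1.0 - lam)
    return d * alpha / (n + d * (alpha - 1.0))


def lambda_from_factor(factor: float, n: int, d: int) -> float:
    """Formula (7): lambda = (n - d) I / [d + (n - 2d) I]."""
    return (n - d) * factor / (d + (n - 2 * d) * factor)


if __name__ == "__main__":
    n, d = 10, 2
    for lam in (0.1, 0.5, 0.9):
        print(lam, round(de_interpolation_factor(lam, n, d), 3))
    # I = .3254 is the factor quoted in the text for a 95% interval.
    print(round(lambda_from_factor(0.3254, n, d), 2))
```

Because (7) is the exact inverse of the double exponential curve, applying one function after the other returns the starting value of λ.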


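Example 2. The uniform distribution on $(-1,1)$. The distribution function is given by $F(x) = 0$ if $x < -1$, $2^{-1}(x+1)$ if $|x| \le 1$, and 1 if $x > 1$. Note also that $F(-\alpha y) = 0$ if $y > \alpha^{-1}$, $2^{-1}(-\alpha y + 1)$ if $-\alpha^{-1} \le y \le \alpha^{-1}$, and 1 if $y < -\alpha^{-1}$. Now, using a binomial expansion on $(1-\alpha y)^d$ and the beta integral, we have when $0 \le \alpha \le 1$ ($\lambda \le 1/2$),

$$\int_0^\infty [F(-\alpha y)]^d [1-F(y)]^{n-d-1}\, dF(y) = \left(\frac{1}{2}\right)^n \int_0^1 (1-\alpha y)^d (1-y)^{n-d-1}\, dy = \left(\frac{1}{2}\right)^n \frac{d!\,(n-d-1)!}{n!} \sum_{j=0}^{d} (-\alpha)^j \binom{n}{d-j} . \qquad (8)$$

When $0 < \alpha^{-1} < 1$ ($\lambda > 1/2$) we have

$$\int_0^\infty [F(-\alpha y)]^d [1-F(y)]^{n-d-1}\, dF(y) = \left(\frac{1}{2}\right)^n \int_0^{\alpha^{-1}} (1-\alpha y)^d (1-y)^{n-d-1}\, dy$$

$$= \alpha^{-1} \left(\frac{1}{2}\right)^n \int_0^1 (1-t)^d (1-\alpha^{-1} t)^{n-d-1}\, dt$$

$$= \alpha^{-1} \left(\frac{1}{2}\right)^n \frac{d!\,(n-d-1)!}{n!} \sum_{j=0}^{n-d-1} (-\alpha^{-1})^j \binom{n}{n-d-1-j} . \qquad (9)$$

The formulas (8) and (9) can then be used to calculate $I(\lambda)$ for various values of $\lambda$.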


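Example 3. Pieced together double exponential and logistic distributions. The distribution function is given by $F(x) = 2^{-1}\exp(x/\tau)$ if $x < 0$, and $[1 + \exp(-x)]^{-1}$ if $x \ge 0$. If $\tau = 2$, the density function is continuous, the first quartile is -1.39, and the third quartile is 1.1. If $\tau = 10$, there is a jump in the density at 0, the first quartile is -6.9, and the third quartile is 1.1. Because of the asymmetry we must evaluate $\alpha_L$ and $\alpha_U$ separately. See formulas (2).

We have

$$\int_0^\infty [F(-\alpha y)]^d [1-F(y)]^{n-d-1}\, dF(y) = \left(\frac{1}{2}\right)^d \int_0^\infty \exp\left[-y\left(\frac{\alpha d}{\tau} + n - d\right)\right] \{1 + \exp(-y)\}^{-(n-d+1)}\, dy \qquad (10)$$

and

$$\int_{-\infty}^0 [1-F(-\alpha y)]^d [F(y)]^{n-d-1}\, dF(y) = \left(\frac{1}{2}\right)^{n-d} \frac{1}{\alpha\tau} \int_0^\infty \exp\left[-y\left(d + \frac{n-d}{\alpha\tau}\right)\right] \{1 + \exp(-y)\}^{-d}\, dy . \qquad (11)$$

By making the change of variable $u = \exp(-y)$ both integrals can be reduced to the form

$$\int_0^1 u^{N-1} (1+u)^{-M}\, du = \left(\frac{1}{2}\right)^N \sum_{j=0}^\infty \binom{M-N-1}{j} \left(-\frac{1}{2}\right)^j \frac{1}{N+j} . \qquad (12)$$

See Gradshteyn and Ryzhik (1965, p. 285). Using (12), (10) can be written as

$$\left(\frac{1}{2}\right)^{\alpha d/\tau + n} \sum_{j=0}^\infty (j!\, 2^j)^{-1}\, \frac{\Gamma(\alpha d/\tau + j)}{\Gamma(\alpha d/\tau)\, (\alpha d/\tau + n - d + j)}$$

and (11) can be written as

$$\frac{1}{\alpha\tau} \left(\frac{1}{2}\right)^{n + (n-d)/(\alpha\tau)} \sum_{j=0}^\infty (j!\, 2^j)^{-1}\, \frac{\Gamma\left(\frac{n-d}{\alpha\tau} + 1 + j\right)}{\Gamma\left(\frac{n-d}{\alpha\tau} + 1\right) \left(d + \frac{n-d}{\alpha\tau} + j\right)} .$$

These infinite series are straightforward to approximate. For large $j$ the $(j+1)$st term is roughly one-half of the $j$th term. This means that the tail of the series is roughly equal to the last term retained. The examples are calculated accurately to four places.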
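The series approach can be sketched as follows. This is an illustration under stated assumptions rather than the authors' program: the function names and the stopping tolerance are invented here, the integrals are taken in the forms obtained by substituting the Example 3 distribution function into the Proposition 1 integrals and setting u = exp(-y), and the integral of u^(N-1)(1+u)^(-M) over (0,1) is summed as a binomial series after the substitution v = u/(1+u). For n = 10, d = 2 and λ = 1/2 it gives γ close to .950 for τ = 10 and .961 for τ = 2.

```python
from math import comb


def unit_integral(N: float, M: float, tol: float = 1e-12) -> float:
    """Integral of u**(N-1) * (1+u)**(-M) over (0, 1) by a binomial series.

    Equals (1/2)**N * sum_j binom(M-N-1, j) * (-1/2)**j / (N + j).
    Successive terms shrink by about one half, so summation stops once a
    term falls below tol (the neglected tail is then about that size).
    """
    total, coeff, j = 0.0, 1.0, 0  # coeff = binom(M-N-1, j) * (-1/2)**j
    while True:
        term = coeff / (N + j)
        total += term
        if abs(term) < tol:
            break
        coeff *= -0.5 * (M - N - 1 - j) / (j + 1)
        j += 1
    return 0.5 ** N * total


def tail_probabilities(lam: float, n: int, d: int, tau: float):
    """Return (alpha_L, alpha_U) for the pieced distribution; needs 0 < lam < 1."""
    alpha = lam / (1.0 - lam)
    half_next = sum(comb(n, k) for k in range(d + 1)) / 2 ** n  # alpha_{d+1}/2
    scale = (n - d) * comb(n, d)
    lower_integral = 0.5 ** d * unit_integral(alpha * d / tau + n - d, n - d + 1)
    at = alpha * tau
    upper_integral = 0.5 ** (n - d) / at * unit_integral(d + (n - d) / at, d)
    return half_next - scale * lower_integral, half_next - scale * upper_integral


def coefficient(lam: float, n: int, d: int, tau: float) -> float:
    """gamma = 1 - alpha_L - alpha_U."""
    a_lower, a_upper = tail_probabilities(lam, n, d, tau)
    return 1.0 - a_lower - a_upper


if __name__ == "__main__":
    for tau in (2, 10):
        print(tau, [round(coefficient(k / 10, 10, 2, tau), 3) for k in range(1, 10)])
```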


For the normal, logistic and Cauchy distributions we used numerical integration as mentioned previously. As an illustration we take $n = 10$, $d = 2$, $\gamma_d = .9786$, $\gamma_{d+1} = .8907$. In Table 1 we show $I(\lambda)$ for $\lambda = .1(.1).9$ and $\gamma = \gamma_d - (\gamma_d - \gamma_{d+1}) I$ in parentheses. Linear interpolation results are provided for comparison.

- Table 1 about here -

Note in Table 1 how close $\gamma$ is for all $\lambda$ and for the spread of distributions from uniform to Cauchy. The logistic distribution was indistinguishable from the normal distribution so it was not included in the table.


If the underlying distribution can be supposed to be symmetric then we recommend using formula (7) to determine $\lambda$ from $I$. For the example considered here, if we want a 95% confidence interval then from (5) $I = .3254$ and from (7) $\lambda = .66$. Hence $X_L = .34 X_{(2)} + .66 X_{(3)}$ and $X_U = .34 X_{(9)} + .66 X_{(8)}$ provide the 95% confidence interval.
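Putting the pieces together, the recommended procedure can be sketched as a single function: find the adjacent pair of distribution-free intervals that bracket the desired coefficient, compute I from (5), convert to λ with (7), and interpolate the order statistics. The function name, the error handling, and the test sample are assumptions made for this illustration. Exact binomial coefficients are used, so λ differs slightly in the third decimal from the value obtained with the rounded coefficients in the text.

```python
from math import comb


def interpolated_median_ci(sample, level: float = 0.95):
    """Return (X_L, X_U, lambda) for the interpolated interval based on (5) and (7)."""
    xs = sorted(sample)
    n = len(xs)

    def gamma(d: int) -> float:
        # Coefficient of [X_(d), X_(n-d+1)]: 1 - 2 P(S < d), S ~ Binomial(n, 1/2).
        return 1.0 - 2.0 * sum(comb(n, k) for k in range(d)) / 2 ** n

    for d in range(1, (n + 1) // 2):
        g_d, g_next = gamma(d), gamma(d + 1)
        if g_next <= level <= g_d:
            factor = (g_d - level) / (g_d - g_next)               # definition (5)
            lam = (n - d) * factor / (d + (n - 2 * d) * factor)   # formula (7)
            lower = (1.0 - lam) * xs[d - 1] + lam * xs[d]         # X_(d), X_(d+1)
            upper = (1.0 - lam) * xs[n - d] + lam * xs[n - d - 1]  # X_(n-d+1), X_(n-d)
            return lower, upper, lam
    raise ValueError("level is not bracketed by two adjacent sign intervals")


if __name__ == "__main__":
    print(interpolated_median_ci(range(1, 11), 0.95))
```

With the sample 1, ..., 10 the order statistics equal their ranks, so the endpoints directly show the weights placed on X_(2), X_(3) and on X_(9), X_(8).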



In Table 2 we illustrate the $n = 10$, $d = 2$ case for the asymmetric distribution in Example 3. The table shows the lower and upper tails, $\alpha_L$ and $\alpha_U$, and then compares $\gamma = 1 - \alpha_L - \alpha_U$ to the $\gamma$ calculated from the double exponential example.

- Table 2 about here -

Table 2 shows that mild asymmetry does not matter much and we would still use (7). In the pathological case, $\tau = 10$, the results were surprisingly close even though the two tails differed by quite a bit. Further, for this extreme case, linear interpolation is at least twice as far from $\gamma$ as (7).

Hence, we conclude that for most practical situations (7) provides an accurate interpolation formula.
Table 1. Interpolation Factors and Confidence Coefficients.

 λ     DE*              U               N               C
.1     .027 (.976)**    .023 (.977)     .025 (.976)     .026 (.976)
.2     .059 (.973)      .052 (.974)     .055 (.974)     .057 (.974)
.3     .096 (.970)      .091 (.971)     .092 (.971)     .094 (.970)
.4     .143 (.966)      .137 (.967)     .139 (.966)     .141 (.966)
.5     .200 (.961)      .200 (.961)     .200 (.961)     .200 (.961)
.6     .273 (.955)      .282 (.954)     .280 (.954)     .275 (.954)
.7     .369 (.946)      .396 (.944)     .388 (.944)     .373 (.946)
.8     .501 (.935)      .553 (.930)     .536 (.931)     .506 (.934)
.9     .692 (.918)      .753 (.912)     .736 (.914)     .694 (.918)

*  DE = Double exponential, U = Uniform, N = Normal, C = Cauchy
** The number in parentheses is the confidence coefficient.


Table 2. Confidence Coefficients in Asymmetric Case

τ = 2

 λ     α_U      α_L      γ       DE
.1     .0119    .0117    .976    .976
.2     .0133    .0130    .974    .973
.3     .0151    .0146    .970    .970
.4     .0173    .0165    .966    .966
.5     .0201    .0189    .961    .961
.6     .0237    .0220    .954    .955
.7     .0284    .0262    .945    .946
.8     .0346    .0320    .933    .935
.9     .0430    .0407    .916    .918

τ = 10

 λ     α_U      α_L      γ       DE
.1     .0163    .0109    .973    .976
.2     .0220    .0112    .967    .973
.3     .0275    .0115    .961    .970
.4     .0325    .0120    .956    .966
.5     .0372    .0126    .950    .961
.6     .0414    .0135    .945    .955
.7     .0452    .0149    .940    .946
.8     .0487    .0175    .934    .935
.9     .0518    .0236    .925    .918